Day 1
Meaning? Depends on whom you ask.
Frequentist: Essentially, the (long-run) relative frequency (or proportion) of an event happening
Bayesian: Essentially, the relative plausibility of an event happening given what we already know about what generates events and what we actually observe (i.e., data)
Which one is best?
NEITHER
Both are useful
“Frequentist” or “Bayesian”, probabilities obey the same rules:
Union (mutually exclusive): \(Pr(A \cup B) = Pr(A) + Pr(B)\), if \(Pr(A \cap B) = 0\)
Intersection: \(Pr(A \cap B)\)
Union (not mutually exclusive): \(Pr(A \cup B) = Pr(A) + Pr(B) - Pr(A \cap B)\)
Joint probability: \(Pr(A \cap B) = Pr(A) \cdot Pr(B)\), if A and B are independent
Independence: \(Pr(A|B)=Pr(A)\) and \(Pr(B|A)=Pr(B)\)
Conditional probability: \(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\)
\(\cup\): read it like “probability of either A OR B or both occurring”
\(\cap\): read it like “probability of A AND B simultaneously occurring”
Note that, under independence between A and B (so that \(Pr(A|B)=Pr(A)\)):
\(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\\Pr(A)\cdot Pr(B)=Pr(A \cap B)\)
While, without assuming independence between A and B:
\(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\\Pr(A|B)\cdot Pr(B)=Pr(A \cap B)\)
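These rules can be checked numerically. A minimal sketch in R, using a fair six-sided die as the sample space (the events A and B are my own illustrative choices, not from the slides):

```r
# Fair six-sided die: A = "even", B = "greater than 3"
omega <- 1:6
A <- c(2, 4, 6)
B <- c(4, 5, 6)

pr <- function(ev) length(ev) / length(omega)  # equally likely outcomes

pr_A     <- pr(A)                # 1/2
pr_B     <- pr(B)                # 1/2
pr_AandB <- pr(intersect(A, B))  # {4, 6} -> 1/3
pr_AorB  <- pr(union(A, B))      # {2, 4, 5, 6} -> 2/3

# Union rule (not mutually exclusive): Pr(A) + Pr(B) - Pr(A and B)
pr_A + pr_B - pr_AandB           # equals pr_AorB

# Conditional probability: Pr(A|B) = Pr(A and B) / Pr(B)
pr_A_given_B <- pr_AandB / pr_B  # 2/3
```

Here \(Pr(A \cap B) = 1/3 \neq Pr(A)\cdot Pr(B) = 1/4\), so A and B are not independent, which is exactly the case where the conditional-probability formula earns its keep.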
Discrete measures 👉 probability
Continuous measures 👉 density
The probability for any specific value of a continuous measure is \(0\)
Densities are related to (but not exactly the same as) probabilities
Both still obey probability rules: pdfs integrate to 1; pmfs sum to 1
Cdf(s) map a value to the probability of the measure assuming that value or a lower one
Usually written as: \(F(x) = Pr(X \leq x)\)
Cdf(s) exist for both continuous and discrete measures
Quantiles are values of a measure that split its distribution into intervals of given probability
Example: percentiles split a probability distribution into 100 intervals of equal probability
Example: the median is the 2nd quartile
The Fantastic 4
d*, p*, q*, r*
d*: compute density (cont.) or probability (discr.)
p*: returns \(Pr(measure\leq quantile)\) (mind the lower.tail argument)
q*: returns the quantile for a given \(Pr(measure\leq quantile)\) (mind the lower.tail argument)
r*: draw random values of measures from a model
Examples:
Gaussian: dnorm, pnorm, qnorm, rnorm
Binomial: dbinom, pbinom, qbinom, rbinom
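A quick tour of the four families for the Gaussian and the Binomial (values in the comments are approximate):

```r
# The four function families, shown for the standard Gaussian
dnorm(0)                          # density at 0: 1/sqrt(2*pi) ~ 0.399
pnorm(1.96)                       # Pr(X <= 1.96) ~ 0.975
pnorm(1.96, lower.tail = FALSE)   # upper tail: ~ 0.025
qnorm(0.975)                      # quantile for Pr = 0.975: ~ 1.96
rnorm(5)                          # five random draws

# And for a Binomial(size = 10, prob = 0.5)
dbinom(5, size = 10, prob = 0.5)    # Pr(Y = 5)
pbinom(5, size = 10, prob = 0.5)    # Pr(Y <= 5)
qbinom(0.5, size = 10, prob = 0.5)  # median: 5
rbinom(3, size = 10, prob = 0.5)    # three random draws
```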
Data: information we have available
Model: a set of assumptions to describe a simplified version of reality
Parametric model: Model described by parameters (see pdf(s) and pmf(s))
Probability: how measures behave according to our model
We have data and models, what do we do now?
Let’s use data to estimate model parameters!
| ID | BodyMass |
|---|---|
| 1 | 5085.467 |
| 2 | 4983.132 |
| 3 | 4384.706 |
| 4 | 4773.966 |
| 5 | 5224.501 |
| 6 | 5272.518 |
| 7 | 4467.005 |
| 8 | 4892.681 |
Assumption: body mass of (all existing) Gentoo penguins is normally distributed with some mean and variance
Parametric model: \(Gentoo\hspace{1 mm}body\hspace{1 mm}mass \sim \mathcal{N}(\mu,\, \sigma^{2})\)
Maximizing the joint probability of the data given the parameters finds the parameter value(s) that maximize the likelihood (L) of observing the data (under the assumed model)!
Likelihood(parameters|data) = Probability(data|parameters)
Link data, model and L
Data: sample of \(n\) penguins on which we measure BM
Model:
\(BM_i \sim \mathcal{N}(\mu,\,\sigma^{2})\)
Probability (density) for \(BM_i\):
\(f(BM_i) = \frac{1}{\sigma \sqrt{2\pi} } e^{-\frac{1}{2}\left(\frac{BM_i-\mu}{\sigma}\right)^2}\)
L (given model):
\(\prod\limits_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi} } e^{-\frac{1}{2}\left(\frac{BM_i-\mu}{\sigma}\right)^2}\)
“Move” along combinations of \(\mu\) and \(\sigma^2\) and find those that maximize L
We usually maximize the log-likelihood (LL) for two main reasons:
\(\log(\prod\limits_{i=1}^{n}X_i) = \sum\limits_{i=1}^{n}\log(X_i)\): sums are easier to differentiate and compute than products
Numerically, a product of many small densities underflows to 0, while a sum of their logs stays finite
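A quick numerical illustration of why the log matters in practice:

```r
set.seed(1)
x <- rnorm(1000)  # 1000 standard-normal draws

# The likelihood as a raw product of densities underflows to exactly 0
prod(dnorm(x))

# The log-likelihood as a sum of log-densities stays finite and usable
sum(dnorm(x, log = TRUE))
```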
We will:
Estimate the population mean of Gentoo body mass using brute force
Estimate regression parameters for the relationship between Gentoo body mass and flipper length (without brute force)
Estimate rate parameter of a Poisson population
NOW GO TO R..
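A sketch of the brute-force idea, using the 8 body-mass values from the table above (fixing \(\sigma\) at the sample SD is my simplification, so the grid search runs over \(\mu\) only):

```r
# The 8 body-mass measurements from the table above
bm <- c(5085.467, 4983.132, 4384.706, 4773.966,
        5224.501, 5272.518, 4467.005, 4892.681)

# Grid of candidate means; sigma fixed at the sample SD for simplicity
mu_grid <- seq(4000, 5500, by = 0.5)
sigma   <- sd(bm)

# Log-likelihood of the sample at each candidate mu
ll <- sapply(mu_grid, function(mu) sum(dnorm(bm, mean = mu, sd = sigma, log = TRUE)))

mu_hat <- mu_grid[which.max(ll)]
mu_hat    # lands on the grid point closest to mean(bm), the analytical MLE
mean(bm)
```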
What I think I am doing
Model:
\(\mu_i = \alpha + \beta \cdot flipper\hspace{1mm}length_i\)
What I am actually doing
Model:
\(Gentoo\hspace{1 mm}body\hspace{1 mm}mass_i \sim \mathcal{N}(\mu_i,\, \sigma^{2})\)
\(\mu_i = \alpha + \beta \cdot flipper\hspace{1mm}length_i\)
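A sketch of this model in R, on simulated data (the values of \(\alpha\), \(\beta\) and \(\sigma\) are made up for illustration). No brute force needed: lm() returns the least-squares estimates, which coincide with the MLEs of \(\alpha\) and \(\beta\) under Gaussian errors:

```r
set.seed(42)
n       <- 500
flipper <- runif(n, 200, 230)   # flipper length (mm), simulated
alpha   <- -5000; beta <- 45; sigma <- 300

# Simulate body mass from the two-part model above
bm <- rnorm(n, mean = alpha + beta * flipper, sd = sigma)

# Fit the linear model; coefficients are the MLEs under Gaussian errors
fit <- lm(bm ~ flipper)
coef(fit)   # close to the alpha and beta used to simulate
```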
\(Y \sim Pois(\lambda)\), with \(Y\) assuming integer values \(\geq 0\)
Pmf: \(Pr(Y=y) = \frac{\lambda^{y} e^{-\lambda}}{y!}\)
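A sketch of the brute-force approach for the Poisson rate (the data are simulated with a made-up \(\lambda\)); the grid maximum lands on the sample mean, the analytical MLE for a Poisson:

```r
# Counts drawn from a Poisson model (lambda = 4 is a made-up value)
set.seed(7)
y <- rpois(200, lambda = 4)

# Log-likelihood over a grid of candidate lambdas (brute force)
lambda_grid <- seq(0.1, 10, by = 0.01)
ll <- sapply(lambda_grid, function(l) sum(dpois(y, lambda = l, log = TRUE)))
lambda_hat <- lambda_grid[which.max(ll)]

lambda_hat  # grid point closest to mean(y)
mean(y)
```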
Likelihood function \(\neq\) Pdf
We found the MLE(s). Does this mean that we now know the population parameters? NO!
From the shape (curvature) of the LL around its maximum, we can estimate how precisely the population parameters are estimated
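One way to see this in R: optim() can return the Hessian of the negative LL at the maximum (the observed information), whose inverse approximates the variance of the estimates. A sketch using the body-mass sample from the table (parametrising via \(\log\sigma\) is my choice, to keep \(\sigma > 0\)):

```r
# Body-mass sample from the table above
bm <- c(5085.467, 4983.132, 4384.706, 4773.966,
        5224.501, 5272.518, 4467.005, 4892.681)

# Negative log-likelihood for (mu, log_sigma)
nll <- function(par) -sum(dnorm(bm, mean = par[1], sd = exp(par[2]), log = TRUE))

fit <- optim(c(4800, log(300)), nll, hessian = TRUE)

# Hessian of the NLL at the MLE = observed information;
# its inverse approximates the variance of the estimates
se <- sqrt(diag(solve(fit$hessian)))
fit$par[1]   # MLE of mu (close to the sample mean)
se[1]        # its standard error: sharper LL peak -> smaller SE
```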
A re-arrangement of conditional probability:
Conditional probability: \(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\)
\(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\)
\(Pr(A|B)Pr(B)=Pr(A \cap B)\)
But \(Pr(A \cap B) = Pr(B \cap A)\)
And \(Pr(B \cap A) = Pr(B|A)Pr(A)\)
So \(Pr(A|B)Pr(B) = Pr(B|A)Pr(A)\)
Dividing both sides by \(Pr(B)\) we end up with:
Bayes’ rule: \(Pr(A|B) = \frac{Pr(B|A)Pr(A)}{Pr(B)}\)
\(Pr\): we are familiar with them (pdf(s), pmf(s))
\(Pr(B|A)\): what if I tell you that \(B\) is data and \(A\) parameters?
WELL DONE! IT’S THE LIKELIHOOD!
\(Pr(A)\): the prior, a model for the parameter(s) 🤯
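Putting the pieces together: a grid approximation of Bayes’ rule for the Gaussian mean, with \(\sigma\) assumed known (the prior and the value of \(\sigma\) are illustrative assumptions, not from the slides):

```r
# Body-mass sample from the table above
bm <- c(5085.467, 4983.132, 4384.706, 4773.966,
        5224.501, 5272.518, 4467.005, 4892.681)
sigma <- 310   # assumed known, for illustration

mu_grid <- seq(4000, 5800, by = 1)
prior   <- dnorm(mu_grid, mean = 5000, sd = 500)                     # Pr(A): prior on mu
lik     <- sapply(mu_grid, function(mu) prod(dnorm(bm, mu, sigma)))  # Pr(B|A): likelihood
post    <- prior * lik / sum(prior * lik)                            # Pr(A|B), normalised

mu_grid[which.max(post)]  # posterior mode: pulled between prior mean and sample mean
sum(post)                 # the posterior sums to 1 over the grid
```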